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Improvements of various methods to compute the sign function of the hermitian Wilson-Dirac matrix within 
the overlap operator are presented. An optimal partial fraction expansion (PFE) based on a theorem of Zolotarev 
is given. Benchmarks show that this PFE together with removal of converged systems within a multi-shift CG 
appears to approximate the sign function times a vector most efficiently. A posteriori error bounds are given. 



1. INTRODUCTION 

The overlap operator, D = l + r75 sign((3), sat- 
isfies the Ginsparg- Wilson relation and thus ex- 
hibits chiral symmetry at finite lattice spacing a 
(see [0 and references therein). However, due to 
the sign function of the hermitian Wilson-Dirac 
operator, Q, its numerical evaluation is extremely 
costly, with an overhead estimated to be at least 
a factor O(IOO) compared to Wilson fermions. 

In this status report of our ongoing interdis- 
ciplinary project, we demonstrate that well es- 
tablished methods to compute the sign function 
like Lanczos and multi-shift CG in combination 
with a partial fraction expansion (PFE/CG) can 
be improved substantially. We present bench- 
marks of Neuberger's PFE/CG method [|l|, an 
optimal PFE/CG method with reduced number 
of poles, a PFE- improved version of Borigi's Lanc- 
zos process for as well as the standard 
Chebyshev approximation. It turns out that the 
PFE/CG method with removal of converged sys- 
tems is most efficient. 

Furthermore, for error monitoring and termi- 
nation of iterations, we derive a posteriori error 
bounds of the approximation of the sign function 
for both Lanczos (in terms of the residual of a 
related CG-process) and PFE methods (in terms 
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of the residuals in the multi-shift solver). 

2. NUMERICAL PROBLEMS 

Computations involving the overlap operator, 
D, are characterized by two nested iterations, (i) 
the outer iterative solution of 

Dx = {l+r-f5sign{Q))x^b |r| < 1, (1) 

requiring (ii) an inner iteration for s 

s = sign(Q)6 (2) 

in each outer iteration step. 

Despite the fact that nested schemes are sub- 
optimal as information built-up for the sign func- 
tion is discarded after each iteration, they might 
still be superior to alternatives from 0. 

3. NUMERICAL METHODS FOR Sx 

3.1. Polynomial approximations for t^i 

These methods determine polynomials pk 
which approximate t~2 on [a^, 6^] with a < |A„i„| 
and b > \X,„^^\, the extremal eigenvalues of Q. 
The approximation to s = sign{Q)b is then 
s w Qpk{Q^)b. Polynomials that have been 
used are Chebyshev polynomials with linear con- 
vergence (error cx (f^)*^, n being the condition 
number of Q), Legendre polynomials as applied 
in Ref. Q , Gegenbauer polynomials as introduced 
by Bunk (Ref. [^) and Schulz polynomials (error 
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oc (fqij)'^^) "^hich will be presented in a forth- 
coming publication of our collaboration. 

3.2. Lanczos based methods 

The Lanczos process in matrix form reads 

QVk^ VkTk + Pk+iVk+iel, with V^b = ei, (3) 

where we assume ||6|| — 1. We refer to eq. (^) 
as "Lanczos for Q". Two ways have been pro- 
posed to approximate sign{Q)b by diagonaliza- 
tion of Tk'. 

sign{Q)b^QVk{T^y^e,. ([2]) (4) 

sign(g)6«14sign(Tfc)ei. ([6]) (5) 

The errors of both methods are highly oscillating 
as a function of k. For the second one, the peaks 
are bounded, however. In order to avoid such 
oscillations Borigi has introduced an alternative 
based on a Lanczos process on 

g2 Vk - VkTk + pk+ivk+iel, with V^b = ei. (6) 



sign{Q)b^QVkT^, 'ei. 



(7) 



The latter method ("Lanczos for Q^") shows a 
smoother convergence rate as well as a poten- 
tially smaller projected system Tk- However, in 
any case the spectral decomposition of Tk is com- 
putationally very costly. 

3.3. Partial Praction Expansion and multi- 
shift CG (PFE/CG) 

The elegant idea to use a fixed number of vec- 
tors by means of partial fractions expansions has 
been proposed by Neuberger W: 



sign((5)6«a; 



PFE_ ^~^ 
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The TO vectors x], are computed in step k of the 
multi-shift CG method j7| for the shifts r^. 

Two rational approximations so far have been 
applied in the context of the overlap operator: 
Neuberger's proposal (||l|): The coefficients 
are defined by 



tan^ 



^ J.C0S-2(-ZL(j_ 1)) 

771 V 2m V 2 // 
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In general, a large number m of poles is re- 
quired to achieve practical precisions. 
Remez algorithm (Edwards et al. [^): By 
use of the Remez algorithm, an optimal approx- 
imation g{x) to x^i in || • is con- 
structed, resulting in a substantially smaller num- 
ber of poles. However, the sign function is aprox- 
imated as x g{x'^) which is not the || • |joo-optimal 
approximation to sign(a;) in [— |A„ax|, — |A,„i„|] U 
[|A_|,|A_|]. 

4. IMPROVEMENTS 

4.1. Lanczos procedures 

As mentioned, the Lanczos approach might be 
slow since a diagonalization of Tk is required. 
As far as the "Lanczos on Q^" approach is con- 
cerned we propose to use a PFE, as detailed next, 
to compute a first approximation to the inverse 
square root of the full matrix (Tk). Based on this 
approximation, the Lanczos procedure is repeated 
to yield the final approximation to {Q^)~^b. 

4.2. PFE/CG 

The vector updates in PFE/CG play a signifi- 
cant role for large numbers of poles for practical 
implementations. Therefore, we seek for a reduc- 
tion of the number of poles to improve PFE/CG. 
In contrast to Ref. |^ we try to find a rational 
function f{x) that minimizes 

\\i-V^f{m\^-"'^--y (10) 

Thena;/(a:^) is the ||-||oo-optimal rational approx- 
imation of the sign function on [— |A„„^|, — |A„i„|]U 
[|A„,i„|, |A,„ax|]- By means of Zolotarev's theorem 
1^ f{x) can be given in analytic form: 

m - z, nt^'^°"> (11) 

where the coefficients can be expressed in terms 
of Jacobian elliptic functions: 

sn^{lK/2k;K) 



ci 



l-sn'^{lK/2k;K) 



, Z = l,...,2fc-l, (12) 



with yl 



and D being uniquely de- 



termined by the condition 



max [1 — ^/xf{x)] = — min [1 — ^/xf{x)]. 
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Table 1 shows that the method drastically reduces 
the number of poles, in particular for large con- 
dition numbers. 

Table 1 

Number of poles for precision 0.01. 



Am ill 


Neuberger 


Remez |g 


Zolotarev 


200 


19 


7 


5 


1000 


42 


12 


6 


100000 


> 500 


? 


10 



Another interesting idea is to remove converged 
systems from the multi-shift process early, as 
residuals for shifted matrices with large shifts r.; 
reduce more quickly. Under some restrictions on 
the quality of PFE we can show that for a total 
error of at most e we can stop updating system j 
after step k as soon as 



Kll< 



(13) 



5. A POSTERIORI ERROR BOUNDS 

The error in the PFE / CG method is composed 
of 2 parts. For a total error of at most e we de- 
mand: 



1. I sign(A) 



i=l 



A 



<e/2, 



ll.\\x^ 



Xk\\< e/2. 



(14) 



(15) 



One can prove that the total error < e, if the CG 
residual for the smallest shift satisfies 



< 



2 + e 



(16) 



For "Lanczos for Q^" it is worth noting that 
we also managed to get a posteriori error bounds 
which will be presented in a forthcoming paper. 

6. NUMERICAL EXPERIMENTS 

Our tests have been carried out on quenched 
16^ configurations at /? = 6.0 and m — —1.6 with 
the error for the approximation of the sign func- 
tion being < lO^^*'. The timings are from 16 
nodes of the Wuppertal cluster computer ALICE. 



Table 2 
Benchmarks. 



confs 


1 2 3 


4 


5 


|A„,„| ■ 10'' 


0.455 1.39 1.17 


2.23 


3.02 


|A_| 


2.48 2.48 2.48 


2.48 


2.48 


poles Neub. 


143 82 89 


65 


56 


poles Zolo. 


21 18 19 


17 


16 


Che by she V 


MVs 


9501 3501 4001 


2301 


2201 


timo/s 


655 247 278 


160 


154 


Lanczos/PFE 


MVs 


2281 1969 1953 


1853 


1769 


timo/s 


150 131 129 


124 


118 


PFE/CG Neuberger 


MVs 


Y 985 977 


929 


887 


time/s 


? 340 362 


274 


215 


PFE/CG Zolotarev without removal 


MVs 


1141 985 977 


927 


885 


time/s 


154 125 125 


116 


102 


PFE/CG Zolotarev + removal 


MVs 


1205 1033 1033 


971 


927 


time/s 


122 93 97 


87 


79 



7. OUTLOOK 

The benchmark results demonstrate that the 
PFE/CG/Zolotarev procedure with removing of 
converged systems turns out to be most effective. 

As a next step, we will tune the accuracy of 
the sign approximation within the solution of the 
outer problem, Dx — b (eq. (|l|)). Furthermore, 
we will investigate the effect of projecting out 
some low eigenvalues of Q onto our findings. 
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